Behavior of Consolidated Trees when using Resampling Techniques

نویسندگان

  • Jesús M. Pérez
  • Javier Muguerza
  • Olatz Arbelaitz
  • Ibai Gurrutxaga
  • José Ignacio Martín
چکیده

Many machine learning areas use subsampling techniques with different objectives: reducing the size of the training set, equilibrate the class imbalance or non-uniform cost error, etc. Subsampling affects severely to the behavior of classification algorithms. Decision trees induced from different subsamples of the same data set are very different in accuracy and structure. This affects the explanation of the classification; very important in some domains. This paper presents a new methodology for building decision trees. The final classifier is a single decision tree, so that it maintains the explaining capacity of the classification. A comparison in error and structural stability of our algorithm and the C4.5 algorithm is done. The decision trees generated using the new algorithm, achieve smaller error rates and structurally more steady trees than C4.5 when using subsampling techniques.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Consolidated Trees: An Analysis of Structural Convergence

When different subsamples of the same data set are used to induce classification trees, the structure of the built classifiers is very different. The stability of the structure of the tree is of capital importance in many domains, such as illness diagnosis, fraud detection in different fields, customer’s behaviour analysis (marketing), etc, where comprehensibility of the classifier is necessary...

متن کامل

The Effect of the Used Resampling Technique and Number of Samples in Consolidated Trees’ Construction Algorithm

In many pattern recognition problems, the explanation of the made classification becomes as important as the good performance of the classifier related to its discriminating capacity. For this kind of problems we can use Consolidated Trees ́ Construction (CTC) algorithm which uses several subsamples to build a single tree. This paper presents a wide analysis of the behavior of CTC algorithm for ...

متن کامل

A new algorithm to build consolidated trees: study of the error rate and steadiness

This paper presents a new methodology for building decision trees, Consolidated Trees Construction algorithm, that improves the behavior of C4.5. It reduces the error and the complexity of the induced trees, being the differences in the complexity statistically significant. The advantage of this methodology in respect to other techniques such as bagging, boosting, etc. is that the final classif...

متن کامل

Consolidated Tree Construction Algorithm: Structurally Steady Trees

This paper presents a new methodology for building decision trees or classification trees (Consolidated Trees Construction algorithm) that faces up the problem of unsteadiness appearing in the paradigm when small variations in the training set happen. As a consequence, the understanding of the made classification is not lost, making this technique different from techniques such as bagging and b...

متن کامل

Selecting Multiway Splits in Decision Trees

Decision trees in which numeric attributes are split several ways are more comprehensible than the usual binary trees because attributes rarely appear more than once in any path from root to leaf. There are efficient algorithms for finding the optimal multiway split for a numeric attribute, given the number of intervals in which it is to be divided. The problem we tackle is how to choose this n...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004